Abstract. The ergodic control problem for a non-degenerate diffusion controlled through its drift is
considered under a uniform stability condition that ensures the well-posedness of the associated
Hamilton-Jacobi-Bellman (HJB) equation. A nonlinear parabolic evolution equation is then proposed as a
continuous time continuous state space analog of White's 'relative value iteration' algorithm for solving
the ergodic dynamic programming equation for the finite state finite action case. Its convergence to the
solution of the HJB equation is established using the theory of monotone dynamical systems and also,
alternatively, by using the theory of reverse martingales.

Key words. controlled diffusions; ergodic control; Hamilton-Jacobi-Bellman equation; relative value
iteration; monotone dynamical systems; reverse martingales

AMS subject classifications. Primary, 93E15, 93E20; Secondary, 60J25, 60J60, 90C40

1. Introduction. Consider a controlled Markov chain on a finite state space $S = \{1, \dots, N\}$
with transition probabilities $p_{ij}(u)$, $i,j \in S$, which depend continuously on a control parameter $u$
that lives in a compact 'action' space $\mathbb{U}$, such that when in state $i$ the control $u$ is chosen from a
compact subset $\mathbb{U}_i \subset \mathbb{U}$. Assuming irreducibility for the stochastic matrix
$P^v = [p_{ij}(v_i)]_{i,j \in S}$ with $v = (v_1, \dots, v_N) \in \mathbb{U}_1 \times \dots \times \mathbb{U}_N$,
consider the control problem of minimizing the average (or ergodic) cost

$$ \limsup_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathbb{E}\bigl[r(X_k, U_k)\bigr] $$

for a prescribed $r : S \times \mathbb{U} \to \mathbb{R}$ and control sequence $\{U_k\}$ such that $U_k \in \mathbb{U}_{X_k}$ and

$$ \mathbb{P}(X_{n+1} = j \mid X_m, U_m,\ m \le n) = p_{X_n j}(U_n)\,, \qquad n \ge 0\,. $$

The dynamic programming equation for this problem is the well known controlled Poisson equation:

$$ V(i) = \min_{u \in \mathbb{U}_i} \Bigl[ r(i,u) - \beta + \sum_{j \in S} p_{ij}(u) V(j) \Bigr]\,, \qquad i \in S\,. $$

This is an equation in unknowns $(V, \beta)$, with $V = (V(1), \dots, V(N))^{\mathsf{T}} \in \mathbb{R}^N$ the so called value
function. Under the irreducibility hypothesis above, $V$ is uniquely specified modulo an additive
constant and $\beta$ is uniquely specified as the optimal ergodic cost. See [DY79, Put94] for details.
By analogy with the value iteration algorithm for the discounted cost problem, one may consider
the value iteration algorithm

$$ V^{n+1}(i) = \min_{u \in \mathbb{U}_i} \Bigl[ r(i,u) - \beta + \sum_{j \in S} p_{ij}(u) V^{n}(j) \Bigr]\,, \qquad i \in S\,, \tag{1.1} $$

beginning with an initial guess $V^0(\cdot)$. The difficulty here is that $\beta$ is unknown as well. On the
other hand, if we drop $\beta$ from (1.1), there is no convergence: the map $V^n \mapsto V^{n+1} = F(V^n)$ that
is being iterated lacks the contractivity property of its discounted cost counterpart. Thus clearly
some renormalization is required. The earliest example of such a relative value iteration algorithm
for finite state Markov chains is perhaps that of White [Whi63], which is governed by



$$ h_{k+1}(i) = \min_{u \in \mathbb{U}_i} \Bigl[ r(i,u) + \sum_{j=1}^{N} p_{ij}(u) h_k(j) \Bigr] - \lambda_k\,, \tag{1.2a} $$

$$ \lambda_k = \min_{u \in \mathbb{U}_n} \Bigl[ r(n,u) + \sum_{j=1}^{N} p_{nj}(u) h_k(j) \Bigr]\,. \tag{1.2b} $$

For a discussion of other possible choices for updating (1.2b) see [ABB01].

Bertsekas introduced in [Ber98] a variation of this method that takes the form

$$ h_{k+1}(i) = \min_{u \in \mathbb{U}_i} \Bigl[ r(i,u) + \sum_{j=1}^{N} p_{ij}(u) h_k(j) \Bigr] - \lambda_k\,, $$

$$ \lambda_{k+1} = \lambda_k + \gamma_k\, h_{k+1}(n)\,. $$

Here $\{\gamma_k\}$ is a sequence of positive stepsizes. This has led to the learning algorithms analyzed
in [ABB01]. Recently Shlakhter et al. [SLKJ10] have studied ways of accelerating the convergence
of the above value iteration algorithms.
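
As an illustration of the renormalization idea behind White's scheme, the following minimal sketch iterates a Bellman backup and subtracts its value at a fixed reference state after every sweep. The two-state, two-action transition data `P` and costs `r`, the function name, and the stopping rule are illustrative assumptions and are not taken from [Whi63]; the sketch also assumes all transition probabilities are positive, so that the iteration settles.

```python
def relative_value_iteration(P, r, ref=0, tol=1e-12, max_iter=100_000):
    """Relative value iteration sketch for a finite average-cost MDP.

    P[u][i][j] : probability of moving from state i to j under action u
    r[u][i]    : one-step cost in state i under action u
    ref        : index of the reference state used for renormalization
    Returns (h, lam): relative values with h[ref] == 0 and the offset lam,
    which approximates the optimal average cost at convergence.
    """
    n_actions, n_states = len(P), len(P[0])
    h = [0.0] * n_states
    lam = 0.0
    for _ in range(max_iter):
        # Bellman backup: T h(i) = min_u [ r(i,u) + sum_j p_ij(u) h(j) ]
        Th = [
            min(
                r[u][i] + sum(P[u][i][j] * h[j] for j in range(n_states))
                for u in range(n_actions)
            )
            for i in range(n_states)
        ]
        lam = Th[ref]                    # offset: backup at the reference state
        h_new = [x - lam for x in Th]    # renormalized iterate
        if max(abs(a - b) for a, b in zip(h_new, h)) < tol:
            return h_new, lam
        h = h_new
    return h, lam


# Hypothetical toy data: P[action][state][next_state], r[action][state].
P = [
    [[0.9, 0.1], [0.5, 0.5]],   # action 0
    [[0.2, 0.8], [0.1, 0.9]],   # action 1
]
r = [
    [1.0, 3.0],                 # action 0
    [2.0, 0.5],                 # action 1
]

h, lam = relative_value_iteration(P, r)
```

Without the subtraction of `lam`, the iterates would drift linearly at the rate of the optimal average cost; the renormalization keeps them bounded while leaving the minimizing actions unchanged.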

Studies of convergence of relative value iteration schemes for more general Markov processes
are non-existent. The only related work that comes to mind is the convergence of the value iteration in
(1.1) for denumerable controlled Markov chains [AF99].

Our aim in this paper is to propose a relative value iteration scheme in continuous time and
space for a class of controlled diffusion processes and prove its convergence. While we prefer to think
of this scheme as a continuous time and space relative value iteration, it can also be viewed as a
'stabilization' of a nonlinear parabolic PDE problem in the sense of Has'minskii (see [Has60]). We
follow two different approaches for the proof of convergence, based respectively on the theory of monotone
dynamical systems and the theory of reverse martingales. These should be of independent interest.

The paper is organized as follows. The next section describes the ergodic control problem for
diffusions and the associated Hamilton-Jacobi-Bellman equation, leading to the proposed relative
value iteration scheme. Section 3 provides a motivating illustration from the discrete state
counterpart, introduces some notation, and recalls some key results from parabolic PDEs and monotone
dynamical systems for later use. Section 4 gives the two convergence proofs alluded to in the Abstract,
while Section 5 concludes with some pointers to future work.
2. Problem statement.

2.1. The model. We are concerned with controlled diffusion processes $X = \{X_t,\ t \ge 0\}$
taking values in the $d$-dimensional Euclidean space $\mathbb{R}^d$, and governed by the Itô stochastic differential
equation

$$ dX_t = b(X_t, U_t)\,dt + \sigma(X_t)\,dW_t\,. \tag{2.1} $$

All random processes in (2.1) live in a complete probability space $(\Omega, \mathfrak{F}, \mathbb{P})$. The process $W$ is a
$d$-dimensional standard Wiener process independent of the initial condition $X_0$. The control process
$U$ takes values in a compact, metrizable set $\mathbb{U}$, and $U_t(\omega)$ is jointly measurable in
$(t, \omega) \in [0, \infty) \times \Omega$. Moreover, it is non-anticipative: for $s < t$, $W_t - W_s$ is independent of

$$ \mathfrak{F}_s = \text{the completion of } \sigma\{X_0, U_r, W_r,\ r \le s\} \text{ relative to } (\mathfrak{F}, \mathbb{P})\,. $$

Such a process $U$ is called an admissible control, and we let $\mathfrak{U}$ denote the set of all admissible controls.

We impose the following standard assumptions on the drift $b$ and the diffusion matrix $\sigma$ to
guarantee existence and uniqueness of solutions to (2.1).
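(A1) Local Lipschitz continuity: The functions

$$ b = [b^1, \dots, b^d]^{\mathsf{T}} : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}^d \quad\text{and}\quad \sigma = [\sigma^{ij}] : \mathbb{R}^d \to \mathbb{R}^{d \times d} $$

are locally Lipschitz in $x$ with a Lipschitz constant $\kappa_R$ depending on $R > 0$. In other words,
if $B_R$ denotes the open ball of radius $R$ centered at the origin in $\mathbb{R}^d$, then for all $x, y \in B_R$
and $u \in \mathbb{U}$,

$$ |b(x,u) - b(y,u)| + \|\sigma(x) - \sigma(y)\| \le \kappa_R\, |x - y|\,, $$

where $\|\sigma\|^2 = \operatorname{trace}(\sigma\sigma^{\mathsf{T}})$.
(A2) Affine growth condition: $b$ and $\sigma$ satisfy a global growth condition of the form

$$ |b(x,u)|^2 + \|\sigma(x)\|^2 \le \kappa_1 \bigl(1 + |x|^2\bigr) \qquad \forall (x,u) \in \mathbb{R}^d \times \mathbb{U}\,. $$

(A3) Local non-degeneracy: Let $a = \tfrac{1}{2}\sigma\sigma^{\mathsf{T}}$. For each $R > 0$, we have

$$ \sum_{i,j=1}^{d} a^{ij}(x)\,\xi_i \xi_j \ge \kappa_R^{-1} |\xi|^2 \qquad \forall x \in B_R\,, $$

for all $\xi = (\xi_1, \dots, \xi_d) \in \mathbb{R}^d$.
We also assume that $b$ is continuous in $(x,u)$.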
In integral form, (2.1) is written as

$$ X_t = X_0 + \int_0^t b(X_s, U_s)\,ds + \int_0^t \sigma(X_s)\,dW_s\,. \tag{2.2} $$

The last term on the right hand side of (2.2) is an Itô stochastic integral. We say that a process
$X = \{X_t(\omega)\}$ is a solution of (2.1) if it is $\mathfrak{F}_t$-adapted, continuous in $t$, defined for all
$\omega \in \Omega$ and $t \in [0, \infty)$, and satisfies (2.2) for all $t \in [0, \infty)$ at once a.s.
With $u \in \mathbb{U}$ treated as a parameter, we define the family of operators
$L^u : C^2(\mathbb{R}^d) \to C(\mathbb{R}^d)$ by

$$ L^u f(x) = \sum_{i,j} a^{ij}(x)\, \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i} b^i(x,u)\, \frac{\partial f}{\partial x_i}(x)\,, \qquad u \in \mathbb{U}\,. \tag{2.3} $$

We refer to $L^u$ as the controlled extended generator of the diffusion.

Of fundamental importance in the study of functionals of $X$ is Itô's formula. For $f \in C^2(\mathbb{R}^d)$
and with $L^u$ as defined in (2.3),

$$ f(X_t) = f(X_0) + \int_0^t L^{U_s} f(X_s)\,ds + M_t\,, \quad \text{a.s.,} \tag{2.4} $$

where

$$ M_t = \int_0^t \bigl\langle \nabla f(X_s),\, \sigma(X_s)\,dW_s \bigr\rangle $$

is a local martingale. Krylov's extension of the Itô formula [Kry80, p. 122] extends (2.4) to functions
$f$ in the local Sobolev space $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^d)$.

Recall that a control is called Markov if $U_t = v(t, X_t)$ for a measurable map
$v : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{U}$, and it is called stationary Markov if $v$ does not depend on $t$, i.e.,
$v : \mathbb{R}^d \to \mathbb{U}$. Correspondingly, the equation

$$ X_t = x_0 + \int_0^t b\bigl(X_s, v(s, X_s)\bigr)\,ds + \int_0^t \sigma(X_s)\,dW_s \tag{2.5} $$

is said to have a strong solution if given a Wiener process $(W_t, \mathfrak{F}_t)$ on a complete probability space
$(\Omega, \mathfrak{F}, \mathbb{P})$, there exists a process $X$ on $(\Omega, \mathfrak{F}, \mathbb{P})$, with $X_0 = x_0 \in \mathbb{R}^d$, which is continuous,
$\mathfrak{F}_t$-adapted, and satisfies (2.5) for all $t$ at once, a.s. A strong solution is called unique, if any two
such solutions $X$ and $X'$ agree $\mathbb{P}$-a.s., when viewed as elements of $C([0, \infty), \mathbb{R}^d)$. It is well known
that under Assumptions (A1)-(A3), for any Markov control $v$, (2.5) has a unique strong solution [GK96].

Let $\mathfrak{U}_{\mathrm{SM}}$ denote the set of stationary Markov controls. Under $v \in \mathfrak{U}_{\mathrm{SM}}$, the process $X$ is
strong Markov, and we denote its transition function by $P^t_v(x, \cdot)$. It also follows from the work
of [BKR01, Sta99] that under $v \in \mathfrak{U}_{\mathrm{SM}}$, the transition probabilities of $X$ have densities which are
locally Hölder continuous. Thus $L^v$ defined by

$$ L^v f(x) = \sum_{i,j} a^{ij}(x)\, \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i} b^i\bigl(x, v(x)\bigr)\, \frac{\partial f}{\partial x_i}(x)\,, \qquad v \in \mathfrak{U}_{\mathrm{SM}}\,, $$

for $f \in C^2(\mathbb{R}^d)$, is the generator of a strongly-continuous semigroup on $C_b(\mathbb{R}^d)$, which is strong Feller.
We let $\mathbb{P}^v_x$ denote the probability measure and $\mathbb{E}^v_x$ the expectation operator on the canonical space
of the process under the control $v \in \mathfrak{U}_{\mathrm{SM}}$, conditioned on the process $X$ starting from $x \in \mathbb{R}^d$ at
$t = 0$.

2.2. The ergodic control problem. Let $r : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}$ be a continuous function bounded
from below, referred to as the running cost. As is well known, the ergodic control problem, in its
almost sure (or pathwise) formulation, seeks to a.s. minimize over all admissible $U \in \mathfrak{U}$

$$ \limsup_{t \to \infty}\, \frac{1}{t} \int_0^t r(X_s, U_s)\,ds\,. \tag{2.6} $$
A weaker, average formulation seeks to minimize

$$ \limsup_{t \to \infty}\, \frac{1}{t} \int_0^t \mathbb{E}^U\bigl[r(X_s, U_s)\bigr]\,ds\,. \tag{2.7} $$

We let $\beta$ be defined as

$$ \beta = \inf_{U \in \mathfrak{U}}\, \limsup_{t \to \infty}\, \frac{1}{t} \int_0^t \mathbb{E}^U\bigl[r(X_s, U_s)\bigr]\,ds\,, \tag{2.8} $$

i.e., the infimum of (2.7) over all admissible controls.

We assume that the running cost function $r : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}_+$ is continuous and locally Lipschitz in
its first argument uniformly in $u \in \mathbb{U}$. Without loss of generality we let $\kappa_R$ be a Lipschitz constant
of $r$ over $B_R$, i.e.,

$$ |r(x,u) - r(y,u)| \le \kappa_R\, |x - y| \qquad \forall x, y \in B_R\,,\ \forall u \in \mathbb{U}\,, $$

and all $R > 0$.

We work under the following stability assumption: 

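Assumption 2.1. There exists a nonnegative, inf-compact $\mathcal{V} : \mathbb{R}^d \to \mathbb{R}$ and positive constants $c_0$,
$c_1$ and $c_2$ satisfying

$$ L^u \mathcal{V}(x) \le c_0 - c_1 \mathcal{V}(x) \qquad \forall u \in \mathbb{U}\,, \tag{2.9a} $$

$$ \sup_{u \in \mathbb{U}} r(x,u) \le c_2\, \mathcal{V}(x) \tag{2.9b} $$

for all $x \in \mathbb{R}^d$. Without loss of generality we assume $\mathcal{V} \ge 1$.

It is well known (see [ABG11, GS72]) that (2.9a) implies that

$$ \mathbb{E}^U_x\bigl[\mathcal{V}(X_t)\bigr] \le \frac{c_0}{c_1} + \mathcal{V}(x)\, e^{-c_1 t} \qquad \forall x \in \mathbb{R}^d\,,\ \forall U \in \mathfrak{U}\,. \tag{2.10} $$

Recall that a control $v \in \mathfrak{U}_{\mathrm{SM}}$ is called stable if the associated diffusion is positive recurrent. We
denote the set of such controls by $\mathfrak{U}_{\mathrm{SSM}}$. Also we let $\mu_v$ denote the unique invariant probability
measure on $\mathbb{R}^d$ for the diffusion under the control $v \in \mathfrak{U}_{\mathrm{SSM}}$. It follows by (2.10) that, under
Assumption 2.1, all stationary Markov controls are stable and that

$$ \int_{\mathbb{R}^d} \mathcal{V}(x)\, \mu_v(dx) \le \frac{c_0}{c_1}\,. $$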
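
To make the bound (2.10) concrete, the following sketch checks it numerically for a one-dimensional example that is not from the paper: the uncontrolled Ornstein-Uhlenbeck diffusion with drift $b(x) = -x$ and $\sigma = 1$, together with the assumed Lyapunov function $1 + x^2$. With $a = \tfrac12\sigma^2 = \tfrac12$ the generator gives $1 - 2x^2 = 3 - 2(1 + x^2)$, so (2.9a) holds with the illustrative constants $c_0 = 3$, $c_1 = 2$. The Euler-Maruyama discretization, step size, path count, and seed are likewise assumptions of this sketch.

```python
import math
import random


def lyapunov(x):
    # Assumed Lyapunov function for this toy example: 1 + x^2 (inf-compact, >= 1).
    return 1.0 + x * x


def mean_lyapunov_mc(x0, t, n_paths=4000, dt=0.01, seed=0):
    """Monte Carlo estimate of E_x[lyapunov(X_t)] via Euler-Maruyama
    for dX = -X dt + dW (an uncontrolled toy diffusion)."""
    rng = random.Random(seed)
    n_steps = round(t / dt)
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += -x * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        total += lyapunov(x)
    return total / n_paths


def mean_lyapunov_exact(x0, t):
    # X_t is Gaussian with mean x0*exp(-t) and variance (1 - exp(-2t))/2.
    return 1.0 + x0 * x0 * math.exp(-2.0 * t) + 0.5 * (1.0 - math.exp(-2.0 * t))


def lyapunov_bound(x0, t, c0=3.0, c1=2.0):
    # Right hand side of the bound: c0/c1 + lyapunov(x) * exp(-c1 * t).
    return c0 / c1 + lyapunov(x0) * math.exp(-c1 * t)


x0, t = 2.0, 1.0
estimate = mean_lyapunov_mc(x0, t)
exact = mean_lyapunov_exact(x0, t)
bound = lyapunov_bound(x0, t)
```

In this example the gap between the bound and the exact mean equals $\tfrac32 e^{-2t}$, so the bound is not attained for finite $t$ but becomes tight as $t \to \infty$.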

Let $C_{\mathcal{V}}(\mathbb{R}^d)$ denote the Banach space of functions in $C(\mathbb{R}^d)$ with norm
$\|f\|_{\mathcal{V}} = \sup_{x \in \mathbb{R}^d} \bigl| f(x) / \mathcal{V}(x) \bigr|$.
Recall that a skeleton of a continuous-time Markov process is a discrete-time Markov process with
transition probability $\hat{P} = \int_0^\infty \alpha(dt)\, P^t$, where $\alpha$ is a probability measure on $(0, \infty)$. Since the
diffusion is non-degenerate, any skeleton of the process is $\phi$-irreducible, with an irreducibility measure
absolutely continuous with respect to the Lebesgue measure. It is also straightforward to show that
compact subsets of $\mathbb{R}^d$ are petite. It then follows that for any $v \in \mathfrak{U}_{\mathrm{SSM}}$ the controlled process under
$v$ is $\mathcal{V}$-geometrically ergodic (see [DMT95, FR05]), or in other words there exist constants $C_0$ and
$\gamma > 0$ such that if $h \in C_{\mathcal{V}}(\mathbb{R}^d)$ then

$$ \Bigl| P^t_v h(x) - \int_{\mathbb{R}^d} h(x)\, \mu_v(dx) \Bigr| \le C_0\, e^{-\gamma t}\, \|h\|_{\mathcal{V}}\, \mathcal{V}(x)\,, \qquad t \ge 0\,,\ x \in \mathbb{R}^d\,. $$
Concerning the ergodic control problem the following result is standard [ABG11].
Theorem 2.2. Under Assumption 2.1 there exists a unique solution $V^* \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$,
satisfying $V^*(0) = 0$, of

$$ 0 = \min_{u \in \mathbb{U}} \bigl[ L^u V^*(x) + r(x,u) \bigr] - \beta\,. \tag{2.11} $$

A control $v^* \in \mathfrak{U}_{\mathrm{SM}}$ is optimal with respect to the criteria (2.6) and (2.7) if and only if it satisfies

$$ \min_{u \in \mathbb{U}} \Bigl[ \sum_{i=1}^{d} b^i(x,u)\, \frac{\partial V^*}{\partial x_i}(x) + r(x,u) \Bigr] = \sum_{i=1}^{d} b^i\bigl(x, v^*(x)\bigr)\, \frac{\partial V^*}{\partial x_i}(x) + r\bigl(x, v^*(x)\bigr) \tag{2.12} $$

a.e. in $\mathbb{R}^d$.

For the rest of the paper $v^* \in \mathfrak{U}_{\mathrm{SSM}}$ denotes some fixed control satisfying (2.12).

2.3. The relative value iteration. We study the following relative value iteration (RVI)
scheme:

$$ \frac{\partial V}{\partial t}(t,x) = \min_{u \in \mathbb{U}} \bigl[ L^u V(t,x) + r(x,u) \bigr] - V(t,0)\,, \qquad V(0,x) = V_0(x)\,, \tag{2.13} $$

with the initial condition $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$.
The main theorem of the paper is as follows.

Theorem 2.3. For each $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, the solution $V(t,x)$ of (2.13) converges to
$V^*(x) + \beta$ as $t \to \infty$.

The proof of convergence of (2.13) is facilitated by the study of the value iteration (VI) equation

$$ \frac{\partial \overline{V}}{\partial t}(t,x) = \min_{u \in \mathbb{U}} \bigl[ L^u \overline{V}(t,x) + r(x,u) \bigr] - \beta\,, \qquad \overline{V}(0,x) = V_0(x)\,. \tag{2.14} $$

Here $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$ as in (2.13). Also $\beta$ is as in (2.8), so it is assumed known.

As shown in Lemma 4.2 in Section 4, $\overline{V}(t, \cdot)$ is bounded in $C_{\mathcal{V}}(\mathbb{R}^d)$ uniformly in $t \ge 0$. By (2.14)
we have

$$ \overline{V}(t,x) = \inf_{U \in \mathfrak{U}} \Bigl( \int_0^t \mathbb{E}^U_x\bigl[r(X_s, U_s) - \beta\bigr]\,ds + \mathbb{E}^U_x\bigl[V_0(X_t)\bigr] \Bigr)\,. \tag{2.15} $$

Also, as we show in Lemma 4.4,

$$ V(t,x) = \overline{V}(t,x) - e^{-t} \int_0^t e^{s}\, \overline{V}(s,0)\,ds + \beta\bigl(1 - e^{-t}\bigr) \qquad \forall x \in \mathbb{R}^d\,,\ t \ge 0\,. $$

It follows that $V(t, \cdot)$ is also bounded in $C_{\mathcal{V}}(\mathbb{R}^d)$ uniformly in $t \ge 0$. Additionally, convergence of
$\overline{V}(t, \cdot)$ as $t \to \infty$ implies the analogous convergence of $V(t, \cdot)$. In Section 4 we provide two separate
proofs of convergence of $\overline{V}(t, \cdot)$ as $t \to \infty$ to a solution of (2.11). The first employs results from the
theory of monotone dynamical systems, while the second utilizes a reverse martingale convergence
theorem.

Remark 2.4. Note that by (2.15) convergence of $\overline{V}(t, \cdot)$ as $t \to \infty$ to a solution of (2.11) implies
that $F(t, \cdot)$ defined by

$$ F(t,x) = \inf_{U \in \mathfrak{U}} \int_0^t \mathbb{E}^U_x\bigl[r(X_s, U_s) - \beta\bigr]\,ds $$

also converges to a solution of the HJB equation in (2.11).

Note also that the (VI) provides a sharp bound for the performance of an optimal ergodic control
$v^*$ over a finite horizon. Indeed, by (2.11), we have

$$ V^*(x) = \mathbb{E}^{v^*}_x \Bigl[ \int_0^t \bigl( r(X_s, U_s) - \beta \bigr)\,ds \Bigr] + \mathbb{E}^{v^*}_x\bigl[V^*(X_t)\bigr]\,. $$

Therefore, by (2.14) with initial condition $V_0 = 0$, we obtain

$$ \int_0^t \mathbb{E}^{v^*}_x\bigl[r(X_s, U_s)\bigr]\,ds - \inf_{U \in \mathfrak{U}} \int_0^t \mathbb{E}^U_x\bigl[r(X_s, U_s)\bigr]\,ds = V^*(x) - \mathbb{E}^{v^*}_x\bigl[V^*(X_t)\bigr] - \overline{V}(t,x)\,, \tag{2.16} $$

and the infimum is realized by any measurable selector from the minimizer of the (VI). Since the
right hand side of (2.16) is bounded in $C_{\mathcal{V}}(\mathbb{R}^d)$ uniformly in $t \in [0, \infty)$, it follows that, under
Assumption 2.1, a stationary Markov average-cost optimal control $v^*$ satisfies

$$ \int_0^T \mathbb{E}^{v^*}_x\bigl[r(X_t, U_t)\bigr]\,dt \le K\, \mathcal{V}(x) + \inf_{U \in \mathfrak{U}} \int_0^T \mathbb{E}^U_x\bigl[r(X_t, U_t)\bigr]\,dt \qquad \forall T > 0\,. $$

This provides a sharp bound for bias and overtaking optimality over the class of all Markov controls
(compare with the results in [JFHL09] which are restricted to the class of optimal stationary Markov
controls).

3. Preliminaries. 

3.1. A Result from Monotone Dynamical Systems. Let $\mathcal{H}$ be a subset of a metric space
$\mathcal{Y}$ of real valued functions defined on a set $X$. Suppose also that $\mathcal{H}$ is a subset of a Banach space
$\mathcal{G}$ with a positive cone $\mathcal{G}_+$ which has a nonempty interior. Let $\preceq$ be the natural partial order on $\mathcal{H}$
relative to the positive cone $\mathcal{G}_+$. In other words, for $h, h' \in \mathcal{H}$ we write $h \preceq h'$ if $h' - h \in \mathcal{G}_+$,
and $h \prec h'$ if $h \preceq h'$ and $h \ne h'$. We also introduce the relation $\ll$ and write $h \ll h'$ if
$h' - h \in \operatorname{int}(\mathcal{G}_+)$, where 'int' denotes the interior.

Let $\Phi : \mathcal{H} \times \mathbb{R}_+ \to \mathcal{H}$ be a semiflow on $\mathcal{H}$. In other words, $\Phi$ satisfies

(i) $\Phi_0(h) = h$ for all $h \in \mathcal{H}$;

(ii) $\Phi_t \circ \Phi_s = \Phi_{t+s}$ for all $t, s \in \mathbb{R}_+$.

As is well known, if $h \in \mathcal{H}$, then its orbit $O(h)$ is defined by $O(h) = \{\Phi_t(h) : t \ge 0\}$. Also
the $\omega$-limit set of $h \in \mathcal{H}$ is denoted by $\omega(h)$ and defined as
$\omega(h) = \bigcap_{t > 0} \overline{\bigcup_{s \ge t} \Phi_s(h)}$, where the
closure is in $\mathcal{Y}$. The semiflow is called monotone (strongly monotone) if $h \preceq h'$ ($h \prec h'$) implies
that $\Phi_t(h) \preceq \Phi_t(h')$ ($\Phi_t(h) \ll \Phi_t(h')$) for all $t > 0$. It is called eventually strongly monotone if it is
monotone and whenever $h \prec h'$ there exists some $t_0 \in \mathbb{R}_+$ such that $\Phi_{t_0}(h) \ll \Phi_{t_0}(h')$. As shown
in [Smi95, Proposition 1.1], if $\Phi$ is eventually strongly monotone then it is strongly order preserving
(SOP), and this means that whenever $h \prec h'$ there exist open neighborhoods $U$ and $U'$ of $h$ and $h'$,
respectively, and $t_0 \ge 0$ such that $\Phi_t(U) \preceq \Phi_t(U')$ for all $t \ge t_0$.
Let

$$ \mathcal{E} = \{ h \in \mathcal{H} : \Phi_t(h) = h\,,\ \forall t \ge 0 \}\,. $$

In other words, $\mathcal{E}$ is the set of equilibria of the semiflow. A point $h \in \mathcal{H}$ is called quasiconvergent
if $\omega(h) \subset \mathcal{E}$, and convergent if $\omega(h)$ is a singleton. Let $\mathcal{Q}$ and $\mathcal{C}$ denote the sets of quasiconvergent
and convergent points, respectively.
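We quote the following theorem [Smi95, Theorem 4.3 and Remark 4.2] which shows that
quasiconvergence is generic. We need the following notation: We write $h_n \uparrow\uparrow h$ ($h_n \downarrow\downarrow h$) if
$h_n \prec h_{n+1}$ ($h_n \succ h_{n+1}$) and $h_n \to h$ in $\mathcal{H}$ as $n \to \infty$.

Theorem 3.1. Let $\Phi_t$ be a strongly order preserving semiflow on $\mathcal{H} \subset \mathcal{G}$. Suppose that

(i) For any $h \in \mathcal{H}$ there exists a sequence $\{h_n\} \subset \mathcal{H}$ such that $h_n \uparrow\uparrow h$ or $h_n \downarrow\downarrow h$.

(ii) For each $h \in \mathcal{H}$ the closure of $O(h)$ is a compact subset of $\mathcal{H}$.

(iii) If $\{h_n\} \subset \mathcal{H}$ is such that $h_n \uparrow\uparrow h$ or $h_n \downarrow\downarrow h$, then $\bigcup_{n \in \mathbb{N}} \omega(h_n)$ has compact closure in
$\mathcal{Y}$ which is contained in $\mathcal{H}$.

Then $\mathcal{H} = \operatorname{int}(\mathcal{Q}) \cup \overline{\operatorname{int}(\mathcal{C})}$. Moreover, if $\mathcal{E}$ is totally ordered with respect to $\preceq$, then
$\mathcal{Q} = \mathcal{C}$, which implies that $\mathcal{H} = \overline{\operatorname{int}(\mathcal{C})}$.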

3.2. The case of continuous time controlled Markov chains. To illustrate our approach,
we consider here the simple case of a controlled Markov chain with state space $S$ in continuous time,
with 'rate matrix' $Q(u) = [q_{ij}(u)]$, $i,j \in S$, depending continuously on a parameter $u$ that lives in a
compact action space $\mathbb{U}$. The matrix $Q$ satisfies $q_{ij} \ge 0$ for all $i \ne j$ and $\sum_{j \in S} q_{ij} = 0$. Suppose
first that the state space is finite, i.e., $S = \{1, \dots, N\}$. To guarantee irreducibility we assume that
there exists an irreducible rate matrix $\hat{Q} = [\hat{q}_{ij}]$ and a constant $\delta > 0$ such that $q_{ij}(u) \ge \delta\, \hat{q}_{ij}$ for
all $i \ne j$ and $u \in \mathbb{U}$. Let $r : S \times \mathbb{U} \to \mathbb{R}$ be a running cost. The solution of the ergodic control
problem has the following characterization: There exists a unique pair $(V^*, \beta)$ with $\beta$ a constant
and $V^* : S \to \mathbb{R}$, satisfying $V^*(N) = 0$, which solve with $V = V^*$ the equation

$$ \min_{u \in \mathbb{U}} \Bigl[ \sum_{j \in S} q_{ij}(u) V(j) + r(i,u) \Bigr] = \beta \qquad \forall i \in S\,. \tag{3.1} $$

Moreover a stationary Markov control $v = (v_1, \dots, v_N)$ is average-cost optimal if and only if it is a
selector from the minimizer in (3.1). Expressing $r$ in vector form as $r(u) = (r(1,u), \dots, r(N,u))$,
the relative value iteration algorithm takes the form of the following differential equation in $\mathbb{R}^N$:

$$ \frac{dh}{dt} = \min_{u \in \mathbb{U}} \bigl[ Q(u) h + r(u) \bigr] - \mathbf{1}\, h_N(t)\,, \qquad h(0) = g \in \mathbb{R}^N\,, \tag{3.2} $$

where $\mathbf{1}$ indicates the vector whose components are all equal to 1. Showing existence of solutions
to (3.2) is straightforward. One can follow for example the method used in the proof of Lemma 4.1
which appears in Section 4. The corresponding value iteration equation is

$$ \frac{dh}{dt} = \min_{u \in \mathbb{U}} \bigl[ Q(u) h + r(u) \bigr] - \mathbf{1}\, \beta\,, \qquad h(0) = g \in \mathbb{R}^N\,. \tag{3.3} $$

We apply Theorem 3.1 to (3.3). Here $\mathcal{H}$ and $\mathcal{G}$ are isomorphic to $\mathbb{R}^N$ under the Euclidean norm
topology. Hence the partial ordering is $h \preceq h' \iff h_i \le h'_i$ for all $i \in S$. The fact that (3.3) is
strongly order preserving follows from the irreducibility of the chain. Hypothesis (i) of Theorem 3.1
is obviously satisfied in $\mathcal{H} \simeq \mathbb{R}^N$. Since the solution of (3.3) is uniformly bounded for any initial
condition, with the bound depending continuously on the initial condition $g$, it follows that hypotheses
(ii) and (iii) of Theorem 3.1 are satisfied. The equilibrium set $\mathcal{E}$ of (3.3) is the set of $V \in \mathbb{R}^N$ which
solve (3.1). Hence $\mathcal{E} = \{V^* + c : c \in \mathbb{R}\}$, which is a totally ordered set. It then follows from
Theorem 3.1 that $\mathcal{H} = \overline{\operatorname{int}(\mathcal{C})}$. It is also straightforward to show from (3.3) that the solutions are
continuous with respect to the initial condition, uniformly in $t \in [0, \infty)$, i.e., that if $g^n$ is a sequence
converging to $g \in \mathbb{R}^N$ as $n \to \infty$, then

$$ \sup_{t \ge 0}\, \bigl| \Phi_t(g^n) - \Phi_t(g) \bigr| \xrightarrow[n \to \infty]{} 0\,. $$

As a result, $\mathcal{C}$ is closed and hence every initial condition is a convergent point. By (3.2)-(3.3) and
following the argument at the end of Section 4.1 for the proof of Theorem 2.3, it follows that $h(t)$
converges to $V^* + \beta$. Convergence of the relative value iteration for countable state space Markov
chains in continuous time follows along the same lines, provided a Lyapunov hypothesis analogous
to (2.9a) is imposed, as well as appropriate assumptions to guarantee the regularity of the process.
We do not delve into these details, since the focus in this paper is continuous state space models.
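
The finite-state relative value iteration (3.2) can be explored numerically. The sketch below integrates it with an explicit Euler scheme for a hypothetical two-state chain with two actions per state; the rate matrices `Q`, the costs `r`, the step size, and the horizon are assumptions chosen only for illustration. For this data, solving (3.1) by hand gives $\beta = 2/3$ and $V^* = (1/3, 0)$, so the iterates should approach $V^* + \beta = (1, 2/3)$.

```python
def bellman_rhs(Q, r, h):
    """Componentwise min over actions of [Q(u) h + r(u)]; the action in
    state i only affects row i, so the minimum is taken state by state."""
    n_actions, n_states = len(Q), len(h)
    return [
        min(
            r[u][i] + sum(Q[u][i][j] * h[j] for j in range(n_states))
            for u in range(n_actions)
        )
        for i in range(n_states)
    ]


def rvi_ode(Q, r, g, dt=1e-3, steps=40_000):
    """Explicit Euler integration of dh/dt = min_u[Q(u)h + r(u)] - 1*h_N,
    started from h(0) = g. The last state plays the role of state N."""
    h = list(g)
    for _ in range(steps):
        rhs = bellman_rhs(Q, r, h)
        ref = h[-1]
        h = [x + dt * (y - ref) for x, y in zip(h, rhs)]
    return h


# Hypothetical toy data: Q[action][state][next_state], rows sum to zero.
Q = [
    [[-1.0, 1.0], [2.0, -2.0]],   # action 0
    [[-3.0, 3.0], [0.5, -0.5]],   # action 1
]
r = [
    [1.0, 3.0],                   # action 0
    [2.0, 0.5],                   # action 1
]

h_final = rvi_ode(Q, r, g=[0.0, 0.0])
beta_estimate = h_final[-1]       # h_N(t) approaches the optimal ergodic cost
```

Since every rate matrix annihilates the constant vector, the equilibrium of the discretized iteration coincides with that of (3.2), so the step size affects only stability and the speed of convergence, not the limit.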

3.3. Notation and Background. The term domain in $\mathbb{R}^d$ refers to a nonempty, connected
open subset of the Euclidean space $\mathbb{R}^d$. We introduce the following notation for spaces of real-valued
functions on a domain $D \subset \mathbb{R}^d$. The space $L^p(D)$, $p \in [1, \infty)$, stands for the usual Banach space
of (equivalence classes of) measurable functions $f$ satisfying $\int_D |f(x)|^p\,dx < \infty$, and $L^\infty(D)$ is the
Banach space of functions that are essentially bounded in $D$. The space $C^k(D)$ ($C^\infty(D)$) refers to the
class of all functions whose partial derivatives up to order $k$ (of any order) exist and are continuous.
The standard Sobolev space of functions on $D$ whose generalized derivatives up to order $k$ are in
$L^p(D)$, equipped with its natural norm, is denoted by $W^{k,p}(D)$, $k \ge 0$, $p \ge 1$.

We adopt the notation $\partial_t = \frac{\partial}{\partial t}$, and for $i,j \in \mathbb{N}$, $\partial_i = \frac{\partial}{\partial x_i}$ and
$\partial_{ij} = \frac{\partial^2}{\partial x_i \partial x_j}$. We often use the
standard summation rule that repeated subscripts and superscripts are summed from 1 through $d$.

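3.4. Some Facts from Parabolic Equations. For a nonnegative multi-index $\alpha = (\alpha_1, \dots, \alpha_d)$
we let $D^\alpha = \partial_1^{\alpha_1} \cdots \partial_d^{\alpha_d}$. Let $Q$ be a domain in $\mathbb{R}_+ \times \mathbb{R}^d$. Recall that $C^{r,k+2r}(Q)$ stands for the set
of bounded continuous functions $\varphi(t,x)$ defined on $Q$ such that the derivatives $D^\alpha \partial_t^\ell \varphi$ are bounded
and continuous in $Q$ for

$$ |\alpha| + 2\ell \le k + 2r\,, \qquad \ell \le r\,. \tag{3.4} $$

For $\varphi \in C^{r,k+2r}(Q)$ and $p \in [1, \infty)$, define

$$ \|\varphi\|_{W^{r,k+2r,p}(Q)} = \sum_{|\alpha| \le k + 2(r - \ell),\ \ell \le r} \bigl\| D^\alpha \partial_t^\ell \varphi \bigr\|_{L^p(Q)}\,. $$

The parabolic Sobolev space $W^{r,k+2r,p}(Q)$ is the subspace of $L^p(Q)$ which consists of those functions
$\varphi$ for which there exists a sequence $\varphi_n$ in $C^{r,k+2r}(Q)$ such that $\|\varphi_n - \varphi\|_{L^p(Q)} \to 0$ as $n \to \infty$ and

$$ \bigl\| D^\alpha \partial_t^\ell \varphi_n - D^\alpha \partial_t^\ell \varphi_m \bigr\|_{L^p(Q)} \xrightarrow[n,m \to \infty]{} 0\,, $$

for all $\alpha$ and $\ell$ satisfying (3.4). In this way the Sobolev derivatives $D^\alpha \partial_t^\ell \varphi$ are well defined as
functions in $L^p(Q)$ and $W^{r,k+2r,p}(Q)$ is a Banach space under the norm introduced.

Let $r : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}$ be a nonnegative continuous function which is locally Lipschitz continuous in $x$
uniformly in $u \in \mathbb{U}$. Let $\kappa_R$ be a Lipschitz constant of $r$ over $B_R$.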
We next review some standard estimates for solutions of equations of the form

$$ -\partial_t \varphi(t,x) + \min_{u \in \mathbb{U}} \bigl[ L^u \varphi(t,x) + r(x,u) \bigr] = f(t,x) \tag{3.5} $$

and

$$ -\partial_t \varphi(t,x) + L^{v_t} \varphi(t,x) = g(t,x)\,. \tag{3.6} $$

Note that if $v$ is a measurable selector from the minimizer in (3.5) then the quasilinear equation
(3.5) transforms to the linear equation (3.6), which in fact takes the particular form

$$ -\partial_t \varphi(t,x) + a^{ij} \partial_{ij} \varphi(t,x) + H(D\varphi, x) = f(t,x)\,, \tag{3.7} $$

where $H$ is Lipschitz continuous in its arguments.

For $R > 0$ and $0 \le T' < T$ define $B_R^{T',T} = (T', T) \times B_R$. Let $g \in W^{0,k,p}(B_R^{0,T})$ and suppose
that $\varphi \in W^{1,2,p}(B_R^{0,T})$ is a solution of (3.6). Then for any $R' \in (0, R)$ and $T' \in (0, T)$ it holds that
$\varphi \in W^{1,2+k,p}(B_{R'}^{T',T})$ and there exists a constant $C_1 = C_1(R', R, T', T, k, d, \kappa_R, \kappa_1, p)$ such that

$$ \|\varphi\|_{W^{1,2+k,p}(B_{R'}^{T',T})} \le C_1 \Bigl( \|g\|_{W^{0,k,p}(B_R^{0,T})} + \|\varphi\|_{L^p(B_R^{0,T})} \Bigr)\,. \tag{3.8} $$

Combining (3.8) with the compactness of the imbedding $W^{2,p}(B_R) \hookrightarrow C^{1}(\bar{B}_R)$, for $p > d$, and the
interpolation inequality, we conclude by using (3.7) that if $f \in W^{0,1,p}(B_R^{0,T})$, then $\varphi \in C^{1,2}(B_{R'}^{T',T})$
and

$$ \max_{|\alpha| \le 2}\, \sup_{B_{R'}^{T',T}} |D^\alpha \varphi| \le C_2 \Bigl( \|f\|_{W^{0,1,p}(B_R^{0,T})} + \|\varphi\|_{L^p(B_R^{0,T})} \Bigr)\,, \tag{3.9} $$

where $C_2$ depends on the parameters in $C_1$. Moreover, if the derivatives $\partial_i f$ are bounded on $B_R^{T',T}$
then

$$ \max_{|\alpha| \le 1}\, \sup_{B_{R'}^{T',T}} |D^\alpha \partial_t \varphi| \le C_3 \Bigl( \max_{|\alpha| \le 1}\, \sup_{B_R^{T',T}} |D^\alpha f| + \|f\|_{W^{0,1,p}(B_R^{0,T})} + \|\varphi\|_{L^p(B_R^{0,T})} \Bigr)\,. \tag{3.10} $$

These estimates can be found in [Kry08, Chapter 5].
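4. Main Results.

4.1. Proof of Theorem 2.3. The proof of Theorem 2.3 involves several intermediate results.
For a subset $Q$ of $\mathbb{R}_+ \times \mathbb{R}^d$, by abuse of notation, we let $C_{\mathcal{V}}(Q)$ denote the Banach space of functions
in $C(Q)$ with norm

$$ \|f\|_{\mathcal{V}} = \sup_{(t,x) \in Q} \left| \frac{f(t,x)}{\mathcal{V}(x)} \right|\,. $$

Let $\mathbb{R}^d_T = [0, T] \times \mathbb{R}^d$. We next show that (2.13) has a unique solution in
$C_{\mathcal{V}}(\mathbb{R}^d_T) \cap C^{1,2}(\mathbb{R}^d_T)$, for any $T > 0$.

Lemma 4.1. For each $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, there exists a unique solution
$V \in C_{\mathcal{V}}(\mathbb{R}^d_T) \cap C^{1,2}(\mathbb{R}^d_T)$ of (2.13), for any $T > 0$.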
Proof. We first show that if $g : [0, T] \to \mathbb{R}$ is a bounded continuous function, then

$$ \partial_t \varphi(t,x) = \min_{u \in \mathbb{U}} \bigl[ L^u \varphi(t,x) + r(x,u) \bigr] - g(t)\,, \qquad \varphi(0,x) = V_0(x) \tag{4.1} $$

has a unique solution in $C_{\mathcal{V}}(\mathbb{R}^d_T) \cap C^{1,2}(\mathbb{R}^d_T)$.

Let $r_n$ denote the truncation of $r$, i.e., $r_n(x,u) = n \wedge r(x,u)$. Let $\tau_R$ denote the first exit time
from the ball of radius $R$ centered at the origin in $\mathbb{R}^d$, and let $\psi_R : \mathbb{R}^d \to [0,1]$ be a smooth function
which satisfies $\psi_R(x) = 1$ for $|x| \le R/2$ and vanishes outside $B_R$. Then the boundary value
problem

$$ \partial_t \varphi_{n,R}(t,x) = \min_{u \in \mathbb{U}} \bigl[ L^u \varphi_{n,R}(t,x) + r_n(x,u) \bigr] - g(t)\,, $$
$$ \varphi_{n,R}(0,x) = V_0(x)\,\psi_R(x)\,, \qquad \varphi_{n,R}(t, \cdot)\big|_{\partial B_R} = 0 \quad \forall t > 0\,, \tag{4.2} $$

has a unique solution in $C^{1,2}(\mathbb{R}^d_T)$. This solution has the stochastic representation

$$ \varphi_{n,R}(t,x) = \inf_{U \in \mathfrak{U}}\, \mathbb{E}^U_x \Bigl[ V_0(X_t)\,\psi_R(X_t)\, \mathbb{I}\{t < \tau_R\} + \int_0^{t \wedge \tau_R} \bigl( r_n(X_s, U_s) - g(s) \bigr)\,ds \Bigr]\,, \tag{4.3} $$

where $\mathbb{I}$ denotes the indicator function. Since

$$ \int_0^t \mathbb{E}^U_x\bigl[r_n(X_s, U_s)\bigr]\,ds \le c_2 \int_0^t \mathbb{E}^U_x\bigl[\mathcal{V}(X_s)\bigr]\,ds \le \frac{c_2}{c_1} \bigl( c_0\, t + \mathcal{V}(x) \bigr) \qquad \forall U \in \mathfrak{U}\,, $$

and $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d)$, we obtain

$$ \varphi_{n,R}(t,x) \le c_3 \bigl( 1 + \mathcal{V}(x) \bigr) + \frac{c_2}{c_1} \bigl( c_0\, t + \mathcal{V}(x) \bigr) + \|g\|_{L^1([0,T])} \tag{4.4} $$

for some constant $c_3 > 0$. Also by (4.3) we have

$$ \varphi_{n,R}(t,x) \ge -c_3 \bigl( 1 + \mathcal{V}(x) \bigr) - \int_0^t |g(s)|\,ds\,, \tag{4.5} $$

and it follows that for any fixed $g$ and $V_0$, the solution $\varphi_{n,R}$ is bounded in $C_{\mathcal{V}}(\mathbb{R}^d_T)$ uniformly in
$R > 0$ and $n \in \mathbb{N}$. The interior estimates of solutions of (4.2) (see [LSU67, p. 342 and p. 351]) allow
us to take limits as $R \to \infty$ (along some subsequence) to obtain a solution $\varphi_n \in C^{1,2}(\mathbb{R}^d_T)$ to

$$ \partial_t \varphi_n(t,x) = \min_{u \in \mathbb{U}} \bigl[ L^u \varphi_n(t,x) + r_n(x,u) \bigr] - g(t)\,, \qquad \varphi_n(0,x) = V_0(x)\,, \tag{4.6} $$

which naturally satisfies the bounds in (4.4)-(4.5). Using again the interior estimates of solutions to
(4.6) we can let $n \to \infty$ to obtain in the limit a solution $\varphi \in C_{\mathcal{V}}(\mathbb{R}^d_T) \cap C^{1,2}(\mathbb{R}^d_T)$ to (4.1). Showing
uniqueness of this solution is standard. Let $\varphi$ and $\varphi'$ be such solutions of (4.1) corresponding to $g$
and $g'$, respectively. Using the inequality $|\inf A - \inf B| \le \sup |A - B|$ we have

$$ \begin{aligned}
\sup_{t \in [0,T]} |\varphi(t,0) - \varphi'(t,0)|
&\le \sup_{t \in [0,T]} \Bigl| \inf_{U \in \mathfrak{U}} \mathbb{E}^U_0 \Bigl[ V_0(X_t) + \int_0^t \bigl( r(X_s, U_s) - g(s) \bigr)\,ds \Bigr] \\
&\qquad\qquad - \inf_{U \in \mathfrak{U}} \mathbb{E}^U_0 \Bigl[ V_0(X_t) + \int_0^t \bigl( r(X_s, U_s) - g'(s) \bigr)\,ds \Bigr] \Bigr| \\
&\le \sup_{t \in [0,T]} \int_0^t |g(s) - g'(s)|\,ds \\
&\le T \sup_{t \in [0,T]} |g(t) - g'(t)|\,.
\end{aligned} $$

Hence for $T < 1$ the map $g(\cdot) \mapsto \varphi(\cdot, 0)$ is a contraction, thus asserting the existence of a solution to
(2.13) in $C_{\mathcal{V}}(\mathbb{R}^d_T) \cap C^{1,2}(\mathbb{R}^d_T)$, for $T < 1$. Concatenating intervals $[0,T], [T,2T], \dots$, with $T < 1$, we
obtain such a solution of (2.13) for any $T > 0$. Uniqueness is again standard. □

The next two lemmas concern estimates for the solutions of the (RVI) and the (VI) . 

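Lemma 4.2. For each $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, the solution $\overline{V}$ of (2.14) satisfies the bound

$$ \bigl| V^*(x) - \overline{V}(t,x) \bigr| \le \|V^* - V_0\|_{\mathcal{V}} \Bigl( \frac{c_0}{c_1} + \mathcal{V}(x)\, e^{-c_1 t} \Bigr) \qquad \forall x \in \mathbb{R}^d\,,\ \forall t \ge 0\,. \tag{4.7} $$

Proof. Let $v^*$ be a measurable selector from the minimizer in (2.11). Then

$$ -\partial_t (V^* - \overline{V}) + L^{v^*} (V^* - \overline{V}) \le 0\,, \tag{4.8} $$

from which, by an application of Itô's formula to $V^*(X_s) - \overline{V}(t - s, X_s)$, $s \in [0, t]$, it follows that

$$ \mathbb{E}^{v^*}_x\bigl[ V^*(X_t) - V_0(X_t) \bigr] \le V^*(x) - \overline{V}(t,x)\,. \tag{4.9} $$

On the other hand, if $\hat{v}$ is a measurable selector from the minimizer in (2.14), then

$$ -\partial_t (V^* - \overline{V}) + L^{\hat{v}} (V^* - \overline{V}) \ge 0\,, $$

and we obtain

$$ V^*(x) - \overline{V}(t,x) \le \mathbb{E}^{\hat{U}}_x\bigl[ V^*(X_t) - V_0(X_t) \bigr]\,. \tag{4.10} $$

Since $V^*$ and $V_0$ are in $C_{\mathcal{V}}(\mathbb{R}^d)$, (4.7) follows by (2.10) and (4.9)-(4.10). □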

Remark 4.3. Note that the Markov control associated with a measurable selector $\hat{v}$ from the
minimizer in (2.14) is computed 'backward' in time. Hence the control applied to the process $X$
considered in (4.10) is the Markov control $\hat{U}(s,x) = \hat{v}(t - s, x)$, $0 \le s \le t$, where $\hat{v}$ solves

$$ \partial_t \overline{V}(t,x) = a^{ij}(x)\, \partial_{ij} \overline{V}(t,x) + b^i\bigl(x, \hat{v}(t,x)\bigr)\, \partial_i \overline{V}(t,x) + r\bigl(x, \hat{v}(t,x)\bigr) - \beta\,. $$



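Lemma 4.4. If $V(0,x) = \overline{V}(0,x) = V_0(x)$ for some $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, then the solutions $V$
and $\overline{V}$ of (2.13) and (2.14), respectively, satisfy

$$ V(t,x) - V(t,0) = \overline{V}(t,x) - \overline{V}(t,0)\,, \tag{4.11} $$

$$ V(t,x) = \overline{V}(t,x) - e^{-t} \int_0^t e^{s}\, \overline{V}(s,0)\,ds + \beta\bigl(1 - e^{-t}\bigr) \tag{4.12} $$

for all $x \in \mathbb{R}^d$ and all $t \ge 0$.

Proof. By (2.13) and (2.14) we have

$$ V(t,x) = \inf_{U \in \mathfrak{U}} \Bigl( \int_0^t \mathbb{E}^U_x\bigl[r(X_s, U_s)\bigr]\,ds + \mathbb{E}^U_x\bigl[V_0(X_t)\bigr] \Bigr) - \int_0^t V(s,0)\,ds\,, \tag{4.13a} $$

$$ \overline{V}(t,x) = \inf_{U \in \mathfrak{U}} \Bigl( \int_0^t \mathbb{E}^U_x\bigl[r(X_s, U_s)\bigr]\,ds + \mathbb{E}^U_x\bigl[V_0(X_t)\bigr] \Bigr) - \beta t\,. \tag{4.13b} $$

Hence (4.11) follows by (4.13a)-(4.13b). Again by (4.13a)-(4.13b) we have

$$ V(t,0) - \beta + \int_0^t \bigl( V(s,0) - \beta \bigr)\,ds = \overline{V}(t,0) - \beta\,, \tag{4.14} $$

and solving (4.14) we obtain

$$ V(t,0) = \overline{V}(t,0) - e^{-t} \int_0^t e^{s}\, \overline{V}(s,0)\,ds + \beta\bigl(1 - e^{-t}\bigr)\,, $$

which combined with (4.11) yields (4.12). □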
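
The relation between the (RVI) and (VI) solutions in Lemma 4.4 has a direct analog for the finite-state equations (3.2) and (3.3), which can be checked numerically. The sketch below integrates both equations with explicit Euler steps from the same initial vector and compares the (RVI) iterate with the right hand side of the analog of (4.12), where the integral is approximated by a left Riemann sum. The two-state rate matrices `Q`, the costs `r`, the initial vector, and the value `beta = 2/3` (obtained by hand for this toy data) are all assumptions of the illustration, not data from the paper.

```python
import math


def bellman_rhs(Q, r, h):
    # Statewise minimum over actions of [Q(u) h + r(u)].
    n_actions, n_states = len(Q), len(h)
    return [
        min(
            r[u][i] + sum(Q[u][i][j] * h[j] for j in range(n_states))
            for u in range(n_actions)
        )
        for i in range(n_states)
    ]


def integrate_vi_and_rvi(Q, r, g, beta, dt=1e-3, steps=5000):
    """Euler-integrate the RVI and VI equations from the same initial
    condition g and evaluate the finite-state analog of (4.12)."""
    h_rvi, h_vi = list(g), list(g)
    integral, t = 0.0, 0.0   # integral ~ int_0^t e^s * h_vi_N(s) ds
    for _ in range(steps):
        integral += dt * math.exp(t) * h_vi[-1]
        rhs_rvi = bellman_rhs(Q, r, h_rvi)
        rhs_vi = bellman_rhs(Q, r, h_vi)
        ref = h_rvi[-1]
        h_rvi = [x + dt * (y - ref) for x, y in zip(h_rvi, rhs_rvi)]
        h_vi = [x + dt * (y - beta) for x, y in zip(h_vi, rhs_vi)]
        t += dt
    shift = -math.exp(-t) * integral + beta * (1.0 - math.exp(-t))
    predicted = [x + shift for x in h_vi]
    return h_rvi, h_vi, predicted


# Hypothetical toy data: Q[action][state][next_state], rows sum to zero.
Q = [
    [[-1.0, 1.0], [2.0, -2.0]],
    [[-3.0, 3.0], [0.5, -0.5]],
]
r = [
    [1.0, 3.0],
    [2.0, 0.5],
]

h_rvi, h_vi, predicted = integrate_vi_and_rvi(Q, r, g=[1.0, -0.5], beta=2.0 / 3.0)
```

The two iterates differ only by a multiple of the constant vector, which mirrors (4.11); the agreement with `predicted` holds up to the discretization error of the Euler steps and of the Riemann sum.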

Next we show that the solution $\overline{V}$ of the (VI) converges as $t \to \infty$ for any initial condition $V_0$.

Theorem 4.5. For each $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, $\overline{V}(t,x)$ converges to $V^*(x) + c$ as $t \to \infty$, for some
$c \in \mathbb{R}$ which depends on $V_0$.

Proof. We view the solutions of (2.14) as a semiflow on $\mathcal{H} = \mathcal{Y} = C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, also letting
$\mathcal{G} = C_{\mathcal{V}}(\mathbb{R}^d)$, and apply Theorem 3.1. We equip $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$ with a complete metric, for example
by letting

$$ d(f,g) = \|f - g\|_{C_{\mathcal{V}}(\mathbb{R}^d)} + \sum_{n=1}^{\infty} \frac{1}{2^n} \Bigl( 1 \wedge \|f - g\|_{C^2(B_n)} \Bigr)\,, $$

where $B_n$ denotes the ball of radius $n$ centered at the origin in $\mathbb{R}^d$ and

$$ \|f\|_{C^2(B)} = \sum_{|\alpha| \le 2}\, \sup_{B}\, |D^\alpha f|\,. $$

Hypothesis (i) of Theorem 3.1 is clearly satisfied. Let $\Phi_t(V_0) : \mathbb{R}^d \to \mathbb{R}$ denote the solution of (2.14)
corresponding to $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$. Let $\mathcal{E} = \{V^* + c : c \in \mathbb{R}\} \subset C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, i.e., the set
of equilibria of this semiflow. Note the following:

(a) For each $V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, $\Phi_t(V_0)$ is bounded in $C_{\mathcal{V}}(\mathbb{R}^d)$ by (4.7). Also the second
order partial derivatives of $\Phi_t(V_0)$ are locally equicontinuous in $x$, uniformly in $t \ge T$
for some $T > 0$ (this requires a slight improvement of (3.9), adding Hölder continuity
which is standard [LSU67, Theorem 5.1]). Hence, every subsequence $\Phi_{t_n}(V_0)$ contains a
further subsequence that converges in $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, which, in turn, implies that the
orbit $\{\Phi_t(V_0) : t \in \mathbb{R}_+\}$ has a compact closure in $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$.

(b) If $\{V_0^n\} \subset C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$ is a monotone sequence such that $V_0^n \to V_0 \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$
as $n \to \infty$, then by (4.7) the set $\bigl\{ \bigcup_{n \in \mathbb{N}} \Phi_t(V_0^n) : t \in \mathbb{R}_+ \bigr\}$ is bounded in $C_{\mathcal{V}}(\mathbb{R}^d)$. Hence it
has locally Hölder equicontinuous second order partial derivatives in $x$, which implies that it
has a compact closure in $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$. In particular, the set $\bigcup_{n \in \mathbb{N}} \omega(V_0^n)$ has compact
closure in $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$.
Hence assumptions (ii) and (iii) of Theorem 3.1 are satisfied.

Consider the partial order relation $\preceq$ on $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$ induced by the positive cone $C_{\mathcal{V}}(\mathbb{R}^d)_+$
in $C_{\mathcal{V}}(\mathbb{R}^d)$. If $V_0 \preceq V_0'$ then (2.14) yields

$$ \mathbb{E}^{\hat{U}'}_x\bigl[ V_0'(X_t) - V_0(X_t) \bigr] \le \Phi_t(V_0')(x) - \Phi_t(V_0)(x) \qquad \forall t \ge 0\,,\ \forall x \in \mathbb{R}^d\,, \tag{4.15} $$

where $\hat{U}'$ is a Markov control associated with a measurable selector from the minimizer in (2.14)
corresponding to the solution starting at $V_0'$ (see Remark 4.3). It follows from (4.15) and the fact
that the support of the transition probabilities of the controlled process is the entire space $\mathbb{R}^d$
that if $V_0 \prec V_0'$, then $\Phi_t(V_0) \ll \Phi_t(V_0')$ for all $t > 0$, or in other words that the semiflow $\Phi$ is
strongly monotone on $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$. As mentioned in Section 3.1 the semiflow is then strongly
order-preserving. Since $\mathcal{E}$ is totally ordered it follows by Theorem 3.1 that $\mathcal{H} = \overline{\operatorname{int}(\mathcal{C})}$.
It remains to show that £ is closed. Note that 

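\[
\bigl|\Phi_t(V_0)(x) - \Phi_t(V_0')(x)\bigr|
\le \sup_{v \in \mathfrak{U}} \bigl|\mathbb{E}^{v}_x \bigl[V_0(X_t) - V_0'(X_t)\bigr]\bigr|
\le \|V_0 - V_0'\|_{\mathcal{V}} \Bigl( \sup_{U \in \mathfrak{U}} \mathbb{E}^{U}_x \bigl[\mathcal{V}(X_t)\bigr] \Bigr), \qquad t \ge 0.
\]

Hence by (2.10) we have

\[
\|\Phi_t(V_0) - \Phi_t(V_0')\|_{\mathcal{V}}
\le \Bigl( \sup_{x \in \mathbb{R}^d} \sup_{U \in \mathfrak{U}} \frac{\mathbb{E}^{U}_x [\mathcal{V}(X_t)]}{\mathcal{V}(x)} \Bigr) \|V_0 - V_0'\|_{\mathcal{V}}
\le \Bigl( 1 + \frac{k_0}{2 k_1} \Bigr) \|V_0 - V_0'\|_{\mathcal{V}}, \qquad t \ge 0.
\]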

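This shows in particular that if $V_0^n$ is a Cauchy sequence of convergent points in $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$,
then $f_n = \omega(V_0^n)$ converges in $C_{\mathcal{V}}(\mathbb{R}^d)$ as $n \to \infty$. Suppose that $V_0 \in \mathcal{C}^c$. Since $\mathcal{C}$ is dense, there
exists $\{V_0^n\} \subset \mathcal{C}$ such that $V_0^n \to V_0$ as $n \to \infty$. Let $f = \lim_{n \to \infty} \omega(V_0^n)$. Since $V_0 \in \mathcal{C}^c$,
we have $\limsup_{t \to \infty} d(\Phi_t(V_0), f) > 0$. Moreover, since for some $T > 0$ the set $\{\Phi_t(V_0) : t \ge T\}$
is precompact in $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, there exist $f' \in C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$, $f \ne f'$, and a sequence $t'_n$
such that $\Phi_{t'_n}(V_0) \to f'$ as $n \to \infty$. On the other hand, we can find a sequence $t_n$ such that
$\sup_{t \ge t_n} \|\Phi_t(V_0^n) - f\|_{\mathcal{V}} \to 0$ as $n \to \infty$. Therefore, for some subsequence
$n(k) \uparrow \infty$, we have $\|\Phi_{t'_{n(k)}}(V_0^k) - f\|_{\mathcal{V}} \to 0$ as $k \to \infty$. Therefore,

\[
0 < \|f' - f\|_{\mathcal{V}}
= \lim_{k \to \infty} \bigl\|\Phi_{t'_{n(k)}}(V_0) - \Phi_{t'_{n(k)}}(V_0^k)\bigr\|_{\mathcal{V}}
\le \Bigl( 1 + \frac{k_0}{2 k_1} \Bigr) \lim_{k \to \infty} \|V_0 - V_0^k\|_{\mathcal{V}} = 0,
\]

yielding a contradiction. Thus we have shown that all points of $C_{\mathcal{V}}(\mathbb{R}^d) \cap C^2(\mathbb{R}^d)$ are convergent and
the proof is complete. □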
We are now ready for the proof of the main result.

Proof of the main theorem. If we define $g(t) = V(t,x) - \overline{V}(t,x)$, then by (4.12) we have

\[
g(t) = \int_0^t e^{s-t} \bigl( \beta - \overline{V}(s,0) \bigr)\, ds.
\]

Since $\overline{V}(t,x) \to V^*(x) + c$ as $t \to \infty$ for each $x \in \mathbb{R}^d$ by Theorem 4.5, it follows that $g(t)$ converges
to $\beta - V^*(0) - c = \beta - c$ as $t \to \infty$. Hence

\[
\lim_{t \to \infty} V(t,x) = \lim_{t \to \infty} \bigl[ g(t) + \overline{V}(t,x) \bigr] = V^*(x) + \beta \qquad \forall x \in \mathbb{R}^d,
\]

and the proof is complete. □
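
The same mechanism can be observed in the finite state setting of the Introduction. The following is a minimal sketch, assuming a randomly generated Markov decision process with strictly positive transition probabilities and reference state 0; the names `bellman`, `P`, `r` and the problem sizes are hypothetical and serve only as an illustration. It iterates the discrete analogs of the (VI) and of the relative value iteration from the same initial condition, and checks that their difference is constant in the state, as is the case for $g(t)$ above.

```python
import numpy as np

def bellman(V, P, r):
    """Apply T V(i) = min_u [ r(i,u) + sum_j P[u,i,j] V(j) ]."""
    # P has shape (actions, states, states); r has shape (states, actions).
    Q = r + np.einsum("uij,j->iu", P, V)
    return Q.min(axis=1)

def relative_value_iteration(P, r, V0, n_iter=2000):
    """White's scheme V_{n+1} = T V_n - V_n(0), with state 0 as reference."""
    V = V0.copy()
    for _ in range(n_iter):
        V = bellman(V, P, r) - V[0]
    return V

def value_iteration(P, r, V0, beta, n_iter=2000):
    """Discrete analog of the (VI): Vbar_{n+1} = T Vbar_n - beta."""
    V = V0.copy()
    for _ in range(n_iter):
        V = bellman(V, P, r) - beta
    return V

# Hypothetical test problem (an assumption, not taken from the paper).
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
# Strictly positive rows make every policy irreducible and aperiodic.
P = rng.uniform(0.1, 1.0, size=(n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

V0 = rng.normal(size=n_states)
V_rvi = relative_value_iteration(P, r, V0)
beta = V_rvi[0]          # at the fixed point, V(0) equals the optimal average cost
V_star = V_rvi - beta    # normalized so that V_star[0] = 0

V_bar = value_iteration(P, r, V0, beta)
c = V_bar[0]             # Vbar approaches V_star + c, where c depends on V0
g = V_rvi - V_bar        # constant across states, close to beta - c

print("beta =", beta, " c =", c, " spread of g =", g.max() - g.min())
```

Changing `V0` changes `c` but leaves `beta` and `V_star` unchanged, which mirrors the roles of $c$ and $\beta$ in Theorem 4.5 and in the main result.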

4.2. An alternate proof of Theorem 4.5. Recall that $v^*$ is an optimal stationary Markov
control. Let $\mu_{v^*}$ be the corresponding invariant probability distribution, and let $X^*_t$, $t \in \mathbb{R}$, be a
stationary solution of (2.1) under the control $v^*$ such that the law of $X^*_t$ is $\mu_{v^*}$ for all $t \in \mathbb{R}$. Let
$\mathfrak{F}_t := \sigma(X^*_s : -\infty < s \le t)$ and

\[
\Psi(t,x) := \overline{V}(t,x) - V^*(x).
\]

By (4.8) we have

\[
-\partial_t \Psi(t,x) + L^{v^*} \Psi(t,x) \ge 0.
\]

Therefore the process

\[
M_t := \Psi(t, X^*_{-t}), \qquad t \in [0, \infty),
\]

is a reverse $(\mathfrak{F}_{-t})$-supermartingale. Also, by (4.7) there exists a constant $C_0$ such that $\mathbb{E}[|M_t|] \le C_0$
for all $t \in [0, \infty)$. We argue by contradiction. Suppose that $\overline{V}(t, \cdot\,) - V^*(\,\cdot\,)$ does not converge to a
constant as $t \to \infty$. Then there must exist constants $a < b$, a ball $D \subset \mathbb{R}^d$, and a pair of sequences
$\{t_n\} \subset \mathbb{R}_+$ and $\{x_n\} \subset D$, $n \in \mathbb{N}$, such that

\[
\Psi(t_{2k-1}, x_{2k-1}) < a, \qquad \Psi(t_{2k}, x_{2k}) > b \qquad \forall k \in \mathbb{N}. \tag{4.16}
\]

Let $B_r(x)$ denote the open ball of radius $r$ centered at $x \in \mathbb{R}^d$. Since $\overline{V}(t, \cdot\,) - V^*(\,\cdot\,)$ is uniformly
equicontinuous on any bounded domain, there exists $r > 0$ such that if $x \in D$, then

\[
|\Psi(t,x) - \Psi(t,y)| < \frac{b-a}{4} \qquad \forall y \in B_{2r}(x). \tag{4.17}
\]

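Let $\mathcal{G} = \{G_i : 1 \le i \le N\}$ be a finite open cover of $D$ with balls of radius $r$. Since $\mathcal{G}$ is finite, an
infinite number of terms of the sequences $\{x_{2k-1} : k \in \mathbb{N}\}$ and $\{x_{2k} : k \in \mathbb{N}\}$ lie in some elements
$G'$ and $G''$ of $\mathcal{G}$, respectively. Dropping to a subsequence of $\{x_k\}$, which is also denoted as $\{x_k\}$, it
follows by (4.16)–(4.17) that

\[
\begin{aligned}
\Psi(t_{2k-1}, x) &< a' \qquad \forall x \in G', \\
\Psi(t_{2k}, x) &> b' \qquad \forall x \in G''
\end{aligned}
\tag{4.18}
\]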
for all $k = 1, 2, \dots$. Without loss of generality we can also assume that the time sequence $t_n$ satisfies
$t_{n+1} - t_n > \gamma_0 > 0$. The convergence of the transition probability under the control $v^*$ implies that
for some constant $\varepsilon_0 > 0$

\[
\mathbb{P}^{v^*}_x \bigl( X_t \in B_r(y) \bigr) \ge \varepsilon_0 \qquad \forall x, y \in D, \ \forall t \ge \gamma_0. \tag{4.19}
\]



It follows by (4.18)–(4.19) that




Therefore, if $\nu$ is the number of upcrossings of $[a', b']$ by $M_t$, then (4.18) and (4.20) imply that
$\mathbb{P}(\nu < \infty) = 0$. However, by the reverse submartingale upcrossings inequality, $\mathbb{E}[\nu] < \infty$, which
gives a contradiction, and the proof is complete.
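
The contradiction rests on counting upcrossings: a path that keeps returning below $a'$ and above $b'$ accumulates infinitely many of them, whereas the upcrossings inequality bounds their expected number. The counting itself can be sketched as follows for a finite sample path; the function name and the sample paths are hypothetical, and the convention of using non-strict inequalities at both levels is an assumption.

```python
def count_upcrossings(path, a, b):
    """Count completed upcrossings of [a, b] by a finite sequence of values."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once the path has visited (-inf, a] since the last upcrossing
    for value in path:
        if not below and value <= a:
            below = True
        elif below and value >= b:
            count += 1
            below = False
    return count

# A path alternating between values below a and above b, as in (4.16).
oscillating = [0.0, 1.0] * 4
# A path that settles strictly between the two levels after one crossing.
settling = [0.0, 1.0, 0.6, 0.55, 0.5, 0.5]

print(count_upcrossings(oscillating, 0.25, 0.75))  # 4
print(count_upcrossings(settling, 0.25, 0.75))     # 1
```

Extending the alternating path adds one upcrossing per period, while a path that eventually stays between the two levels has a finite count.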

5. Conclusions. We have proposed a nonlinear parabolic PDE that serves as a continuous 
time, continuous state space analog of the relative value iteration scheme for solving the ergodic 
dynamic programming equation in finite state problems. This was done under a uniform stability 
condition in terms of an associated Lyapunov function. 

These results suggest several future directions: 

1. An important class of ergodic control problems is one wherein instability is possible, but 
is heavily penalized by using a 'near-monotone' (see [ABG11, Chapter 3] for a definition) running
cost. It would be both interesting and important to extend the above results to this case as it covers 
several important applications. 

2. While the foregoing seems to extend easily to two-person zero-sum stochastic differential 
games with ergodic payoffs, it would be of great interest to do the same for interesting classes of 
non-cooperative games with ergodic payoffs. 

3. Rate of convergence results, computational aspects, and convergence under subgeometric 
ergodicity are also open issues. 